Zhenting Qi

On the Generalization Gap in Self-Evolving Language Model Reasoning

May 31, 2026
Via arXiv

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

May 31, 2026
Via arXiv

Self-Improving Language Models with Bidirectional Evolutionary Search

May 27, 2026
Via arXiv

MoCo: A One-Stop Shop for Model Collaboration Research

Jan 29, 2026
Via arXiv

DSGym: A Holistic Framework for Evaluating and Training Data Science Agents

Jan 22, 2026
Via arXiv

Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases

Dec 20, 2025
Via arXiv

Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

May 29, 2025
Via arXiv

Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision

May 21, 2025
Via arXiv

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models

May 19, 2025
Via arXiv

ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Mar 31, 2025
Via arXiv